Shotgun Sequence Assembly
Abstract
Shotgun sequencing is the most widely used technique for determining the DNA sequence of organisms. It involves breaking up the DNA into many small pieces that can be read by automated sequencing machines, then piecing together the original genome using specialized software programs called assemblers. Due to the large amounts of data being generated and to the complex structure of most organisms’ genomes, successful assembly programs rely on sophisticated algorithms based on knowledge from such diverse fields as statistics, graph theory, computer science, and computer engineering. Throughout this chapter we will describe the main computational challenges imposed by the shotgun sequencing method, and survey the most widely used assembly algorithms.
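The break-apart-then-reassemble idea in the abstract can be illustrated with a toy sketch. This is not the method of any assembler surveyed in the chapter: it is a minimal greedy overlap merge, assuming error-free reads from a single strand and a genome without long repeats. The function names (`shotgun_reads`, `overlap`, `greedy_assemble`) and the `min_len` threshold are hypothetical, chosen only for this illustration.

```python
import random


def shotgun_reads(genome, read_len, n_reads, seed=0):
    """Sample n_reads substrings of length read_len at random start positions."""
    rng = random.Random(seed)
    last_start = len(genome) - read_len
    starts = (rng.randint(0, last_start) for _ in range(n_reads))
    return [genome[s:s + read_len] for s in starts]


def overlap(a, b, min_len=1):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no such overlap of at least min_len characters exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest suffix-prefix overlap.

    Assumption: reads are error-free and the genome has no repeats longer
    than the overlaps, otherwise the greedy choice can join the wrong pieces.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        # Discard reads that are fully contained in another read.
        reads = [r for r in reads if not any(r != o and r in o for o in reads)]
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: the result stays split into contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three hand-picked overlapping fragments of the toy genome ATGCGTACGTTAGC.
fragments = ["ATGCGTA", "CGTACGT", "ACGTTAGC"]
print(greedy_assemble(fragments))  # ['ATGCGTACGTTAGC']
```

The quadratic all-pairs overlap search is only workable for toy inputs. Real data sets are far larger, and repeated regions can make the largest overlap misleading, which is part of why practical assemblers rely on the more sophisticated algorithms the abstract refers to.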
Similar resources
A New Algorithm for DNA Sequence Assembly
Since the advent of rapid DNA sequencing methods in 1976, scientists have had the problem of inferring DNA sequences from sequenced fragments. Shotgun sequencing is a well-established biological and computational method used in practice. Many conventional algorithms for shotgun sequencing are based on the notion of pairwise fragment overlap. While shotgun sequencing infers a DNA sequence given ...
A probabilistic approach to sequence assembly validation
ABSTRACT Sequence assembly is an essential requirement for determining the complete sequence of long DNA. However, sequence assembly programs often generate misassembled contigs by either joining different repeat copies, resulting in joining non-contiguous DNA regions (inverted or swapped) or by including many fragments from different repeat copies resulting in errors in the consensus sequence (n...
Sequence determination from overlapping fragments: a simple model of whole-genome shotgun sequencing.
Assembling fragments randomly sampled from along a sequence is the basis of whole-genome shotgun sequencing, a technique used to map the DNA of the human and other genomes. We calculate the probability that a random sequence can be recovered from a collection of overlapping fragments. We provide an exact solution for an infinite alphabet and in the case of constant overlaps. For the general pro...
Whole Genome Assemblies of the Drosophila and Human Genomes
Shotgun sequence assembly is a classic inverse problem: given a set of segments randomly sampled from a target sequence, the problem is to reconstruct the target. Early programs for this problem assisted a user by finding potential overlapping segments which were then assembled by hand. As the programs became progressively more sophisticated the problem was completely solved by the software but...
Sequencing and Assembly of the 22-Gb Loblolly Pine Genome
Conifers are the predominant gymnosperm. The size and complexity of their genomes have presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun...
Journal title:
- Advances in Computers
Volume 60
Publication date: 2004